Automatic service recovery has one failover action, three restart actions, and one shutdown action. Each RSM is expected to restart any failed local service via one of the following restart actions. A series of restart actions constitutes a recovery strategy.
| Action | Description |
|---|---|
| Failover service | Note: This action is only available for selection when the RSM that owns the service is redundant. See CygNet Redundancy for more information. The RSM will send a message to the RSM that owns the service on the local standby service to perform a hard failover for this service only. The action will not be complete until the hard failover is initiated by the other RSM. In case of failure, it will keep trying with a short wait between attempts. The most likely cause of failure is that the other RSM is performing a different failover. This action will fail if the parent RSM is not redundant, or if the service does not have a clone running on the local standby domain. Note: The failover action is only taken when the active site fails. If a standby site fails, the failover action is skipped since the system would not perform a failover to recover a standby service. |
| Restart | The RSM attempts to restart the service. This is the default action. Note: The recommended restart action is service specific. The ARS, CAS, CVS, and UIS will first archive service files (if configured) to an archive folder and then restart the service. All other services (DBS, FMS, and VHS) will directly restart the service without archiving files. |
| Restore backup and restart | The RSM copies the service’s backup files to the service folder and then attempts to restart the service. Only backups less than two days old will be restored. One limitation of this action is that the automatic backup is performed only once daily, so any data added to the service since the backup will be lost. This action is not recommended for DBS-based services; use Restore backup with changes and restart instead. Note: For the MSS, if an intra-day backup is configured in addition to the automatic one defined in the service configuration file, the restore backup will use the intra-day backup files. |
| Restore backup with changes and restart | The RSM copies the service’s backup files to the service folder, plus any database changes (in transaction log files) since the last full or incremental backup, and then attempts to restart the service. Note: This action applies only to DBS-based services. If selected for non-DBS-based services, the Restore backup and restart action will be used. Example: If the Device Definition Service (DDS) was backed up at 2:00 a.m., new devices were added between 8:00 and 10:00 a.m., and you perform a Restore backup with changes and restart at 11:00 a.m., then the new devices will be included in the restore. |
| Shutdown RSM | Gracefully shuts down the RSM and all of its services. See Backup and Restore for further information. |
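The applicability rules in the table can be summarized in a short Python sketch. This is an illustrative model only, not CygNet code: the `ServiceContext` fields, the function names, and the order in which the failover checks are evaluated are assumptions made for the example. Only the rules themselves (failover skipped for standby failures, failover failing without redundancy or a standby clone, the fallback for non-DBS-based services, and the two-day backup limit) come from the descriptions above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class Action(Enum):
    FAILOVER = "Failover service"
    RESTART = "Restart"
    RESTORE = "Restore backup and restart"
    RESTORE_WITH_CHANGES = "Restore backup with changes and restart"
    SHUTDOWN = "Shutdown RSM"


@dataclass
class ServiceContext:
    """Hypothetical description of a failed service (not a CygNet type)."""
    rsm_is_redundant: bool
    has_standby_clone: bool
    failed_on_active_site: bool
    is_dbs_based: bool


def effective_action(requested: Action, ctx: ServiceContext) -> Optional[Action]:
    """Return the action that would run, or None if the action is skipped."""
    if requested is Action.FAILOVER:
        # A standby-site failure is never recovered by failing over.
        if not ctx.failed_on_active_site:
            return None
        # Failover fails without redundancy or a clone on the standby domain.
        if not (ctx.rsm_is_redundant and ctx.has_standby_clone):
            raise ValueError("failover not possible for this service")
        return Action.FAILOVER
    # Non-DBS-based services fall back to the plain restore action.
    if requested is Action.RESTORE_WITH_CHANGES and not ctx.is_dbs_based:
        return Action.RESTORE
    return requested


MAX_BACKUP_AGE = timedelta(days=2)


def backup_is_restorable(backup_time: datetime, now: datetime) -> bool:
    """Only backups less than two days old are restored."""
    return now - backup_time < MAX_BACKUP_AGE
```

For example, requesting `Action.RESTORE_WITH_CHANGES` for a service whose context has `is_dbs_based=False` yields `Action.RESTORE`, mirroring the note in the table.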
There are two file archive options that can be used with the restart actions. These options only copy files; they do not initiate an action on their own and must be enabled in conjunction with a restart action.
You can specify a minimum runtime that must be met for a recovery to be considered successful. If the service fails again before the minimum runtime is met and you have defined a multi-stage recovery, the recovery moves to the next stage. If the minimum runtime is met, the service is considered "running," and any subsequent failure restarts the recovery at the first stage.
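The stage-selection rule described above can be sketched as a small Python function. This is a hedged illustration rather than the RSM's implementation: the function names, the use of seconds as the runtime unit, and returning `None` once all stages are exhausted are assumptions, since the text does not state what happens after the last stage fails.

```python
from typing import List, Optional


def next_stage(current_stage: int, runtime_s: float,
               minimum_runtime_s: float, stage_count: int) -> Optional[int]:
    """Pick the stage to run after the service fails again.

    `runtime_s` is how long the service ran after `current_stage` executed.
    """
    if runtime_s >= minimum_runtime_s:
        # Recovery counted as successful; a new failure starts at stage 0.
        return 0
    if current_stage + 1 < stage_count:
        # Minimum runtime not met: move to the next configured stage.
        return current_stage + 1
    # Assumption: no stages left, so automatic recovery stops.
    return None


def stages_executed(runtimes_s: List[float], minimum_runtime_s: float,
                    stage_count: int) -> List[int]:
    """List the stage indexes run for a sequence of post-recovery runtimes."""
    stage: Optional[int] = 0
    executed = [0]  # the first failure always triggers the first stage
    for runtime in runtimes_s:
        stage = next_stage(stage, runtime, minimum_runtime_s, stage_count)
        if stage is None:
            break
        executed.append(stage)
    return executed
```

With a hypothetical three-stage recovery and a 60-second minimum runtime, `stages_executed([5, 400, 10], 60, 3)` returns `[0, 1, 0, 1]`: the 5-second run falls short and advances to the second stage, the 400-second run counts as a success so the next failure starts over at the first stage, and the 10-second run advances again.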
You can specify a backup directory that can be different from the one specified in the service configuration file.
You can manually invoke any recovery stage by selecting a stage and clicking Execute Selected Stage. This option allows failover, restart, archive, backup restoration, or shutdown stages to be invoked manually. The ELS logs the service recovery actions that were invoked and the stage.
You can specify a command to execute after a successful recovery (see Adding a Recovery Success Command) or a stage user command (see Recovery Stage Properties).
Although you can configure a multiple-stage recovery, CygNet Software recommends a single-stage recovery that uses the Restart action only, with possibly the Archive file option enabled. A single-stage recovery forces you to analyze the failure if the service will not restart. A single-stage recovery does not assume that the database is the reason the service failed, which may or may not be true. In addition, archiving the service files saves the .log files, which may contain information that can be used to decipher the reason for the service failure. Note that archiving service files on a large system could take several minutes, which will slow down recovery.
See Sample Multiple-Stage Recovery for information about multi-stage recovery.
The recommended recovery method depends on the type of CygNet Software service being recovered. See each of the following topics for recommended recovery methods:
Automatic service recovery options within the RSM can be retrieved and modified via the .NET CygNet.API.ServiceManager.